rm(list=ls())
# Load R package for printing
library(knitr)
Aim
To implement basic areal data modeling and analysis in the R computing environment by using the R packages spdep and spatialreg.
Reading material
Lecture notes and bibliography:
References for R, Rmarkdown, and LaTeX (optional, supplementary material):
New R packages
spdep: Spatial Dependence: Weighting Schemes, Statistics
The R package spdep contains a number of functions to deal with spatial dependence structures. Some of its functions can be used to construct spatial neighborhood matrices (weight matrices), perform tests for spatial autocorrelation (e.g., the Moran test), and evaluate autoregressive models (e.g., SAR, CAR).
spatialreg: Spatial Regression Analysis
The R package spatialreg collects all the estimation functions for spatial cross-sectional models (on lattice/areal data using spatial weights matrices) previously contained in spdep. These model fitting functions include maximum likelihood methods, among others.
Datasets involved
New York leukemia data set
Source: nydata{spData}
Format: a data frame with 281 observations on the following 12 variables, together with binary coded spatial weights
Coordinate variables: “X”, “Y”
Responses: “Z”
Covariates: “TRACTCAS”, “PCTAGE65P”, “PEXPOSURE”
Columbus OH spatial analysis data set
Source: columbus{spData}
Format: a data frame with 49 rows and 22 columns. Unit of analysis: 49 neighbourhoods in Columbus, OH, 1980 data. Includes a polylist object polys with the boundaries of the neighbourhoods and a matrix of polygon centroids coords.
Coordinate variables: “X”, “Y”
Responses: “HOVAL”, “INC”, “CRIME”
Covariates: “AREA”
The following script loads the dataset nydata {spData} and plots the map of the transformed proportions stored in variable Z.
rm(list=ls())
# load the data
library(spData)
library(sf)
# read the data
NY8.data <- st_read(
system.file("shapes/NY8_utm18.shp", package="spData"),
quiet=TRUE
)
# plot
library(tmap)
NY8.map <- NY8.data
tm_shape(NY8.map) +
tm_polygons("Z")
Task
Load the dataset columbus {spData} and plot a map of the variable CRIME.
rm(list=ls())
#######################
# Write your code below
#######################
First we describe how to specify/code the spatial neighborhood structure (Neighbors list).
Then we describe how to specify/code the associated proximity matrix (Spatial weights for neighbors lists).
Neighbors based on contiguity are constructed by assuming that neighbors of a given area are other areas that share a common boundary.
Function poly2nb{spdep} can create a neighbors list based on regions with contiguous boundaries (namely, sharing one or more boundary points).
Important input arguments:
pl=: an sf object containing non-empty (multi-)polygon objects or list of polygons
queen=: if TRUE, a single shared boundary point meets the contiguity condition (queen moves in chess)
The following script
loads nydata {spData}
creates a working copy nydata.map of the dataset,
creates a neighbors list of the neighbors based on contiguity,
prints the neighbors of the first \(3\) locations in the list of polygons.
plots the borders and the neighbor list as a network
rm(list=ls())
# 0 load the data
library(spData)
library(sf)
nydata.data <- st_read(
system.file("shapes/NY8_utm18.shp", package="spData"),
quiet=TRUE
)
# 1 Create a working copy of the spatial data set
nydata.map <- nydata.data
# 2 Compute the neighbors list
library(spdep)
nydata.nb <- spdep::poly2nb(pl = nydata.map, # list of polygons
queen = TRUE # queen contiguity: a single shared boundary point suffices
)
# 3 Print the neighbors of the first 3 locations in the list of polygons
nydata.nb[1:3]
## [[1]]
## [1] 2 13 14 15 47 48 49 50
##
## [[2]]
## [1] 1 3 13 35 47 48
##
## [[3]]
## [1] 2 4 5 12 13 35
# 4 plot the borders and the neighbor list as a network
## Plot the borders
plot(st_geometry(nydata.map), # object of class sf
border = "black" # color of polygon border(s); using NA hides them
)
## Plot the neighbors (superimpose them to the previous plot)
plot.nb(x = nydata.nb, # an object of class nb
coords = st_geometry(nydata.map), # matrix of region point coordinates, a Spatial object (points or polygons),
col = "red",
add = TRUE # superimpose them to the previous plot
)
Task
Use columbus {spData} .
Create a neighbors list of the neighbors based on contiguity.
Print the neighbors of the first \(3\) locations in the list of polygons.
Plot the borders and the neighbor list as a network
rm(list=ls())
# 0 load the data
library(spData)
library(sf)
columbus.data <- st_read(
system.file("shapes/columbus.shp", package="spData"),
quiet=TRUE
)
# #####################
# Write your code below
# #####################
Neighbors based on \(k\) nearest neighbors are constructed by assuming that the neighbors of a given area are the \(k\) other areas with the smallest centroid distances.
It requires combined use of functions knearneigh{spdep} and knn2nb{spdep}.
Function st_centroid {sf} returns the centroids of the areal units.
Function knearneigh{spdep} returns a matrix with the indices of points belonging to the set of the k nearest neighbours of each other.
x=: an sf object containing non-empty (multi-)polygon objects or list of polygons
k=: the number of nearest neighbours to be returned
Function knn2nb{spdep} converts a knn object returned by knearneigh into a neighbours list of class nb with a list of integer vectors containing neighbour region number ids.
The following script:
creates a \(4\) nearest neighbors list
prints the neighbors of the first \(3\) locations in the list of polygons.
plots the borders and the neighbor list as a network
rm(list=ls())
# load the data
library(spData)
library(sf)
nydata.data <- st_read(
system.file("shapes/NY8_utm18.shp", package="spData"),
quiet=TRUE
)
# 1 create a k = 4 nearest neighbors list
nydata.map <- nydata.data
library(spdep)
## get the centroids
coo <- st_centroid(x = nydata.map # object of class sfg, sfc or sf
)
## Compute the neighbors list
nydata.nb <- knn2nb(
knearneigh(coo, # centroids
k = 4 # number nearest neighbors
)
)
# 2 print the neighbors of the first 3 locations
nydata.nb[1:3]
## [[1]]
## [1] 2 13 15 49
##
## [[2]]
## [1] 1 3 13 15
##
## [[3]]
## [1] 2 5 13 35
# 3 plot the borders and the neighbor list as a network
plot(st_geometry(nydata.map), # object of class sf
border = "black" # color of polygon border(s); using NA hides them
)
plot.nb(x = nydata.nb, # an object of class nb
coords = st_geometry(nydata.map), # matrix of region point coordinates, a Spatial object (points or polygons),
col = "red",
add = TRUE # superimpose them to the previous plot
)
Task
Use columbus {spData} .
Create a \(k=2\) nearest neighbors list of the neighbors.
Print the neighbors of the first \(3\) locations in the list of polygons.
Plot the borders and the neighbor list as a network
rm(list=ls())
# load the data
library(spData)
library(sf)
columbus.data <- st_read(
system.file("shapes/columbus.shp", package="spData"),
quiet=TRUE
)
#######################
# Write your code below
#######################
A Proximity matrix (spatial neighborhood matrix) \(W\) defines a neighborhood structure over the entire study region, and its elements can be viewed as weights.
The \((i,j)\)th element of \(W\), denoted by \(w_{ij}\), spatially connects areas \(i\) and \(j\) in some fashion.
Customarily, \(w_{ii}\) is set to \(0\) for \(i=1,...,n\).
More weight is associated with areas closer to \(i\) than those farther away from \(i\).
If neighbors are based on contiguity, we can construct a binary spatial matrix with \(w_{ij}=1\) if regions \(i\) and \(j\) share a common boundary, and \(w_{ij}=0\) otherwise.
Function nb2listw{spdep} constructs a spatial neighborhood matrix containing the spatial weights corresponding to a neighbors list.
Important input arguments
neighbours=: list with neighbors, an object of class nb
style=: indicates the coding scheme chosen.
“B” is the basic binary coding,
“W” is row standardised (sums over all links to n; each neighbor gets weight 1/number of neighbors),
“C” is globally standardised (sums over all links to n)
glist=: (optional) list of general weights corresponding to neighbours. If glist=NULL or ignored, then the weights are 0 or 1.
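As a toy illustration (a hypothetical 3-region adjacency structure, not one of the lab datasets), the following base-R sketch shows what the coding schemes “B”, “W”, and “C” do to a binary adjacency matrix; nb2listw{spdep} performs these computations internally from a neighbours list.

```r
# Toy binary adjacency: region 1 neighbors regions 2 and 3
W.binary <- matrix(c(0, 1, 1,
                     1, 0, 0,
                     1, 0, 0),
                   nrow = 3, byrow = TRUE)
# "B": basic binary coding, weights stay 0/1
W.B <- W.binary
# "W": each row divided by its row sum (1 / number of neighbors)
W.W <- W.binary / rowSums(W.binary)
W.W[1, ]  # region 1 has two neighbors, each weighted 0.5
# "C": globally standardised, all weights equal and summing to n
W.C <- W.binary * (nrow(W.binary) / sum(W.binary))
sum(W.C)  # equals n = 3
```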
The following code
creates a neighborhood list based on \(4\) nearest neighbors,
creates the proximity matrix nydata.nbw as a basic binary coding.
rm(list=ls())
# load the data
library(spData)
library(sf)
nydata.data <- st_read(
system.file("shapes/NY8_utm18.shp", package="spData"),
quiet=TRUE
)
# Create a working copy of the spatial data set
nydata.map <- nydata.data
# 1 create the list of neighbors based on 4 nearest neighbors
library(spdep)
nydata.nb <- knn2nb(
knearneigh(x = st_centroid(nydata.map), # centroids
k = 4 # number nearest neighbors
)
)
# 2 Create the proximity matrix
nydata.nbw <- nb2listw(neighbours = nydata.nb, # list of neighbors
style = "B" # coding style
)
# 3 Print the weights of the first 3 locations
nydata.nbw$weights[1:3]
## [[1]]
## [1] 1 1 1 1
##
## [[2]]
## [1] 1 1 1 1
##
## [[3]]
## [1] 1 1 1 1
Task
Use columbus {spData} .
Create a neighborhood structure based on 2 nearest neighbors
Create a proximity matrix with row standardised coding by using function nb2listw {spdep}
rm(list=ls())
# load the data
library(spData)
library(sf)
columbus.data <- st_read(
system.file("shapes/columbus.shp", package="spData"),
quiet=TRUE
)
#######################
# Write your code below
#######################
# Create a working copy of the spatial data set
columbus.map <- columbus.data
# 1 create the list of neighbors based on 2 nearest neighbors
library(spdep)
columbus.nb <- knn2nb(
knearneigh(x = st_centroid(columbus.map), # centroids
k = 2 # number nearest neighbors
)
)
# 2 Create the spatial neighborhood matrix
columbus.nbw <- nb2listw(neighbours = columbus.nb, # list of neighbors
style = "W" # coding style
)
# 3 Print
columbus.nbw$weights[1:3]
## [[1]]
## [1] 0.5 0.5
##
## [[2]]
## [1] 0.5 0.5
##
## [[3]]
## [1] 0.5 0.5
Spatial autocorrelation is used to describe the extent to which a variable is correlated with itself through space.
Positive spatial autocorrelation occurs when observations with similar values are closer together (i.e., clustered).
Negative spatial autocorrelation occurs when observations with dissimilar values are closer together (i.e., dispersed).
Moran’s \(I\) scatterplot aims to visualize the spatial autocorrelation in the data.
It displays the observations of each area against its spatially lagged values. The spatially lagged value for a given area is calculated as a weighted average of the neighboring values for that area.
We can see a positive relation (positive slope), a negative relation (negative slope), or no relation (no slope).
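To make the lagged-value computation concrete, here is a small base-R sketch (a hypothetical chain of four areas with toy values, not the lab data): with row-standardised weights, the spatial lag is simply the matrix product of the weights matrix and the data vector.

```r
# Toy sketch: 4 areas on a chain (1-2, 2-3, 3-4), hypothetical values
z <- c(10, 20, 30, 40)
W <- matrix(c(0, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 0, 1, 0),
            nrow = 4, byrow = TRUE)
W <- W / rowSums(W)          # row-standardise: each row sums to 1
lag.z <- as.vector(W %*% z)  # weighted average of each area's neighbors
lag.z                        # 20 20 30 30
```

This is what lag.listw{spdep} computes from a listw object, without ever forming the dense matrix.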
The following script loads the data, creates a neighborhood structure and creates a proximity matrix.
rm(list = ls())
# libraries
library(spdep)
# get the data
NY8.data <- st_read(
system.file("shapes/NY8_utm18.shp", package="spData")[1],
quiet=TRUE
)
# make a fresh copy of the data in a working object
NY8.map <- NY8.data
# build a neighbours list based on regions with contiguous boundaries
NY8.nb <- poly2nb(pl = NY8.data # list of polygons
)
# supplements a neighbours list with spatial weights for the chosen coding scheme
NY8.listw <- nb2listw(neighbours = NY8.nb, # an object of class nb
style="W" # style can take values “W”, “B”, “C”, “U”, “minmax” and “S”
)
Function lag.listw{spdep} computes the lag vector by using as input a listw sparse representation of a spatial weights matrix
Important input arguments
x= a listw object created for example by nb2listw
var= a numeric vector the same length as the neighbours list in listw
Output:
a numeric vector of spatially lagged values, of the same length as var
The following script computes the spatially lagged values for each of the locations. Note that we append the output of lag.listw() to the working dataset NY8.map for convenience.
# Compute the spatially lagged values for each of the locations
lagCases = lag.listw(x = NY8.listw, # a listw object created for example by nb2listw
var = NY8.data$Cases # a numeric vector the same length as the neighbours list in listw
)
NY8.map$lagCases <- lagCases
# Plot
plot(x = NY8.map$Cases, # Cases
y = NY8.map$lagCases, # spatially lagged Cases
main = "Moran’s I scatterplot",
xlab = "variable",
ylab = "spatially lagged variable"
)
cor(NY8.map$lagCases, NY8.map$Cases)
## [1] 0.284358
Task
Use columbus{spData}, and consider the variable CRIME.
Produce the Moran’s \(I\) scatter plot by editing the following script.
Is there any evidence of spatial autocorrelation?
rm(list=ls())
# load the data
library(spData)
library(sf)
columbus.data <- st_read(
system.file("shapes/columbus.shp", package="spData"),
quiet=TRUE
)
# Create a working copy of the spatial data set
columbus.map <- columbus.data
# build a neighbours list based on regions with contiguous boundaries
columbus.nb <- poly2nb(pl = columbus.map # list of polygons
)
# supplements a neighbours list with spatial weights for the chosen coding scheme
columbus.listw <- nb2listw(neighbours = columbus.nb, # an object of class nb
style="W" # style can take values “W”, “B”, “C”, “U”, “minmax” and “S”
)
#######################
# Write your code below
#######################
The statistic (Global Moran’s I) is
\[ I = \frac{n \sum_i \sum_j w_{ij}(Z_i - \bar Z)(Z_j - \bar Z)} {(\sum_{i \neq j} w_{ij}) \sum_i (Z_i - \bar Z)^2} \]
where \(n\) is the number of regions, \(Z_i\) is the observed value of the variable of interest in region \(i\), and \(\bar Z\) is the mean of all values. \(w_{ij}\) are spatial weights that denote the spatial proximity between regions \(i\) and \(j\), with \(w_{ii}=0\) and \(i,j=1,…,n\).
Often, we consider its \(z\) (standardized) statistic \[z = \frac{I-E(I)}{Var(I)^{1/2}},\] which has an asymptotic standard Normal distribution; here
\[E[I] = \frac{-1}{n-1}\]
and
\[Var[I] = \frac{n^2(n-1)S_1 - n(n-1)S_2 - 2 S_0^2}{(n+1)(n-1)^2 S_0^2}\]
with
\[S_0 = \sum_{i \neq j} w_{ij},\ S_1= \frac{1}{2}\sum_{i\neq j} (w_{ij}+w_{ji})^2\mbox{ and } S_2 = \sum_k \left(\sum_j w_{kj}+\sum_i w_{ik}\right)^2.\]
The usual hypotheses we test involve the statements:
\(I = E[I]\) , for no spatial autocorrelation
\(I \ne E[I]\) , for spatial autocorrelation
\(I < E[I]\) , for negative spatial autocorrelation
\(I > E[I]\) , for positive spatial autocorrelation
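Before using moran.test{spdep}, it may help to compute the statistic by hand. The following base-R sketch (toy data and a hypothetical chain neighborhood, not the lab datasets) evaluates \(I\) directly from the formula and compares it with the null expectation \(E[I]=-1/(n-1)\).

```r
# Toy data on 6 areas; chain neighborhood: area i adjacent to i-1 and i+1
Z <- c(2, 3, 3, 4, 5, 7)
n <- length(Z)
W <- matrix(0, n, n)
for (i in 1:(n - 1)) { W[i, i + 1] <- 1; W[i + 1, i] <- 1 }
Zc <- Z - mean(Z)                 # centered values
# Global Moran's I from the formula
I  <- (n * sum(W * outer(Zc, Zc))) / (sum(W) * sum(Zc^2))
EI <- -1 / (n - 1)                # null expectation
c(I = I, EI = EI)                 # I = 0.45 > EI = -0.2: positive autocorrelation
```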
Function moran.test{spdep} performs the Global Moran’s I hypothesis test
Important input arguments:
x=, a numeric vector the same length as the neighbours list in listw
listw= a listw object created for example by nb2listw
alternative=: a character string specifying the alternative hypothesis; must be one of “greater”, “less”, or “two.sided”.
The following script loads the data and creates a proximity matrix.
rm(list = ls())
# libraries
library(spdep)
# get the data
NY8.data <- st_read(
system.file("shapes/NY8_utm18.shp", package="spData")[1],
quiet=TRUE
)
NY8.map <- NY8.data
# build a neighbours list based on regions with contiguous boundaries
NY8.nb <- poly2nb(pl = NY8.data # list of polygons
)
# supplements a neighbors list with spatial weights for the chosen coding scheme (use style="B" for binary)
NY8.listw <- nb2listw(neighbours = NY8.nb, # an object of class nb
style = "B" # style can take values “W”, “B”, “C”, “U”, “minmax” and “S”
)
The following script
tests the null hypothesis
\[
\text{H}_{0}: \text{ there is no spatial autocorrelation}
\]
against the alternative hypothesis
\[
\text{H}_{\text{a}}: \text{ there is spatial autocorrelation}
\]
prints a summary of the result.
# 1
NY8.data.gmoran.test <- moran.test(NY8.data$Cases, # the data as a numeric vector the same length as the neighbours list in listw
NY8.listw, # a listw object created for example by nb2listw
alternative = "two.sided" # one of "greater", "less", "two.sided"
)
# 2
print(NY8.data.gmoran.test)
##
## Moran I test under randomisation
##
## data: NY8.data$Cases
## weights: NY8.listw
##
## Moran I statistic standard deviate = 3.9148, p-value = 9.049e-05
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic Expectation Variance
## 0.131929546 -0.003571429 0.001198042
There is a significant autocorrelation at sig. level 5%. Of course, we need to check the validity of the assumptions: Normality, homogeneity, and no systematic dependence due to third variables for Cases. However, we omit these checks here for the sake of presentation.
Task
Consider the data set columbus{spData}, and particularly the variable CRIME.
For the variable CRIME,
test the null hypothesis \[
\text{H}_{0}: \text{ there is no spatial autocorrelation}
\] against the alternative hypothesis
\[
\text{H}_{\text{a}}: \text{ there is positive spatial autocorrelation}
\]
Is there enough evidence of positive autocorrelation at sig. level 5%?
rm(list=ls())
library(spData)
library(sf)
columbus.data <- st_read(
system.file("shapes/columbus.shp", package="spData"),
quiet=TRUE
)
columbus.map <- columbus.data
columbus.nb <- poly2nb(pl = columbus.data )
columbus.listw <- nb2listw(neighbours = columbus.nb,
style = "B"
)
#######################
# Write your code below
#######################
There is often interest in providing a local measure of similarity between each (local) area’s value and those of nearby areas.
Local Indicators of Spatial Association (LISA) (Anselin 1995) are designed to provide an indication of the extent of significant spatial clustering of similar values around each observation.
The local index \(\left\{I_{i}\right\}\) for the \(i\)-th region is
\[ I_i = \frac{n (Y_i - \bar Y)}{\sum_j (Y_j - \bar Y)^2} \sum_j w_{ij}(Y_j - \bar Y). \]
These local indices compose the global Moran’s \(I\) as
\[ I = \frac{1}{\sum_{i \neq j} w_{ij}}\sum_i I_i. \]
Insights:
A high value for \(I_i\) suggests that the area is surrounded by areas with similar values. Such an area is part of a cluster of high observations, low observations, or moderate observations.
A low value for \(I_i\) indicates that the area is surrounded by areas with dissimilar values. Such an area is an outlier indicating that the observation of area \(i\) is different from most or all of the observations of its neighbors.
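The composition above can be checked numerically. The following base-R sketch (a hypothetical chain neighborhood with toy data, not the lab datasets) computes the \(I_i\)'s from the formula and recovers the global \(I\) from them.

```r
# Toy data on 6 areas; chain neighborhood: area i adjacent to i-1 and i+1
Y <- c(2, 3, 3, 4, 5, 7)
n <- length(Y)
W <- matrix(0, n, n)
for (i in 1:(n - 1)) { W[i, i + 1] <- 1; W[i + 1, i] <- 1 }
Yc <- Y - mean(Y)
# Local indicators I_i from the formula
Ii <- (n * Yc / sum(Yc^2)) * as.vector(W %*% Yc)
# The local indicators compose the global Moran's I
I <- sum(Ii) / sum(W)
I  # 0.45
```

Function localmoran{spdep} returns these \(I_i\)'s (column Ii) together with their moments and p-values.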
To interpret the local Moran’s I for each of the areas, it is
necessary to map \(I_{i}\) as well as
their expectations \(E[I_i]\),
variances \(Var[I_i]\), z-scores \(z_i\), and p-values \(\Pr(I_i<E[I_i]|...)\), \(\Pr(I_i>E[I_i]|...)\), etc.
Function localmoran {spdep} computes the local spatial statistic Moran’s \(I_i\) (Ii) along with the expectations E.Ii, variances Var.Ii, z-scores Z.Ii, and p-values Pr().
Important arguments
x=, a numeric vector the same length as the neighbours list in listw
listw= a listw object created for example by nb2listw
alternative= a character string specifying the alternative hypothesis, must be one of greater, less or two.sided (default).
The following script loads the data set and generates the proximity matrix.
rm(list = ls())
# libraries
library(spdep)
library(mapview)
# get the data
NY8.data <- st_read(
system.file("shapes/NY8_utm18.shp", package="spData")[1],
quiet=TRUE
)
NY8.map <- NY8.data
# build a neighbours list based on regions with contiguous boundaries
NY8.nb <- poly2nb(pl = NY8.data # list of polygons
)
# supplements a neighbours list with spatial weights for the chosen coding scheme (use style="B" for binary)
NY8.listw <- nb2listw(neighbours = NY8.nb, # an object of class nb
style = "B" # style can take values “W”, “B”, “C”, “U”, “minmax” and “S”
)
The following script computes the local Moran’s \(I_i\)’s and the associated values.
local.moran <- localmoran(x = NY8.data$Cases, # the values of the variable of interest as a numeric vector the same length as the neighbours list in listw
listw = NY8.listw, # a listw object created for example by nb2listw
alternative = "two.sided" # one of "greater", "less" or "two.sided"
)
print(local.moran[1:10,]) # print the first 10 locations
## Ii E.Ii Var.Ii Z.Ii Pr(z != E(Ii))
## 1 3.68358359 -6.894359e-03 1.887080671 2.68650332 0.007220423
## 2 3.95396484 -2.120315e-02 5.830630478 1.64625735 0.099710786
## 3 -2.27094989 -5.638398e-03 1.554533421 -1.81688691 0.069234432
## 4 -0.50530785 -3.925588e-03 1.090158176 -0.48020227 0.631083574
## 5 -0.31486694 -4.933370e-03 1.360313934 -0.26573517 0.790443174
## 6 -0.20470488 -5.902976e-03 1.627407268 -0.15583765 0.876160989
## 7 0.00410099 -9.210885e-07 0.000255115 0.25681374 0.797322562
## 8 -1.05108791 -1.570686e-02 4.349025547 -0.49648283 0.619553799
## 9 -0.01195552 -3.140586e-05 0.008571970 -0.12879108 0.897522965
## 10 0.16536285 -1.965356e-02 5.422076329 0.07945614 0.936669818
The following script visualizes the outputs by generating maps plotting local Moran’s index and z-scores against the location by using functions of the R package tmap.
library(tmap) # load the library
tmap_mode(
mode = "plot" # this is for printing the standard mode plot (not interactive)
# mode = "view" # this is for printing the interactive plot
) # set the mode to be printable (not the interactive)
# Create a fresh copy of the spatial dataset
NY8.map <- NY8.data
# variable of interest ####################
## tm_shape creates a tmap-element that specifies a spatial data object, which we refer to as shape.
p1 <- tm_shape(shp = NY8.map # shape object
)
## tm_polygons fills the polygons and draws the polygon borders.
p1 <- p1 + tm_polygons(col = "Cases", # the name of a data variable
title = "vble", # title
style = "hclust" # method to process the color scale when col is a numeric variable, here by clustering.
)
## tm_layout specifies the map layout; e.g. controls title, margins, aspect ratio, colors, frame, legend, among many other things.
p1 <- p1 + tm_layout(legend.outside = TRUE # determines whether the legend is plot outside of the map/facets.
)
# local Moran's I ##########
## append the local Moran's I in the sf data.frame
NY8.map$lmI <- local.moran[, "Ii"]
## tm_shape creates a tmap-element that specifies a spatial data object, which we refer to as shape.
p2 <- tm_shape(shp = NY8.map # shape object
)
## tm_polygons fills the polygons and draws the polygon borders.
p2 <- p2 + tm_polygons(col = "lmI", # the name of a data variable
title = "Local Moran's I", # title
style = "hclust" # method to process the color scale when col is a numeric variable, here by clustering.
)
## tm_layout specifies the map layout; e.g. controls title, margins, aspect ratio, colors, frame, legend, among many other things.
p2 <- p2 + tm_layout(legend.outside = TRUE # determines whether the legend is plot outside of the map/facets.
)
# local Moran's I Z-score ##########
## append the local Moran's I in the sf data.frame
NY8.map$lmZ <- local.moran[, "Z.Ii"]
## tm_shape creates a tmap-element that specifies a spatial data object, which we refer to as shape.
p3 <- tm_shape(shp = NY8.map # shape object
)
## tm_polygons fills the polygons and draws the polygon borders.
p3 <- p3 + tm_polygons(col = "lmZ", # the name of a data variable
title = "Z-score", # title
breaks = c(-Inf, 1.65, Inf) # method to process the color scale when col is a numeric variable, here by setting the breaks manually
)
## tm_layout specifies the map layout; e.g. controls title, margins, aspect ratio, colors, frame, legend, among many other things.
p3 <- p3 + tm_layout(legend.outside = TRUE # determines whether the legend is plot outside of the map/facets.
)
## Arrange small multiples in a grid layout.
tmap_arrange( p1, p2, p3)
Task
Consider the data set columbus{spData}, and particularly the variable CRIME.
Compute the local Moran’s \(I_i\)’s and the associated values.
Print the output as a table.
Visualize the outputs by generating maps plotting local Moran’s index and expectation against the location by using functions of the R package tmap.
rm(list=ls())
library(spData)
library(sf)
columbus.data <- st_read(
system.file("shapes/columbus.shp", package="spData"),
quiet=TRUE
)
columbus.map <- columbus.data
columbus.nb <- poly2nb(pl = columbus.data )
columbus.listw <- nb2listw(neighbours = columbus.nb,
style = "B"
)
#######################
# Write your code below
#######################
We focus on two approaches commonly used in practice:
SAR (Simultaneous AutoRegressive) model
CAR (Conditionally AutoRegressive) model
We present coding technicalities for the case that the response variable is continuous and the residuals are Gaussian; however, one can tweak suitable arguments of the discussed R functions to fit other spatial autoregressive models as well.
R function spautolm{spatialreg}
re-parametrises the Gaussian SAR model of the lattice random field
\[
Z = \left(Z_{s};s\in\mathcal{S}\right)^{\top}
= \left(Z_{i}=Z\left(s_{i}\right);i=1,...n\right)^{\top}
\]
as \[
Z
= \mu + \lambda \check{A}\left(Z-\mu\right) + E
= \lambda \check{A}Z + \left(I- \lambda \check{A}\right)\mu + E
\] and \[\begin{align*}
Z &\sim \text{Normal},\\
\text{E}\left(Z\right) &= \mu,\\
\text{Var}\left(Z\right) & =\left(I-\lambda
\check{A}\right)^{-1}\Lambda\left(I-\lambda \check{A}^{\top}\right)^{-1}
\end{align*}\]
where consequently \(\lambda = 0\) implies no spatial autocorrelation.
In the lecture notes we used \[ A = \lambda \check{A} \]
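The implied covariance can be inspected on a toy example. The following base-R sketch (a hypothetical 3-area lattice, not the NY fit) builds \(\text{Var}(Z)=\left(I-\lambda\check{A}\right)^{-1}\Lambda\left(I-\lambda\check{A}^{\top}\right)^{-1}\) and checks that it is a valid covariance matrix.

```r
# Toy 3-area lattice: area 2 neighbors areas 1 and 3
A <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0), nrow = 3, byrow = TRUE)
lambda <- 0.2                  # spatial autocorrelation parameter
Lambda <- diag(3)              # homoscedastic error variances
M <- solve(diag(3) - lambda * A)
V <- M %*% Lambda %*% t(M)     # Var(Z) under the SAR re-parametrisation
isSymmetric(V)                 # TRUE
min(eigen(V)$values) > 0       # TRUE: positive definite
V[1, 2] != 0                   # TRUE: lambda != 0 induces spatial covariance
```

With lambda = 0 the covariance collapses to Lambda, i.e., no spatial autocorrelation.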
Function spautolm{spatialreg} performs the computations and training for the spatial conditional and simultaneous autoregression models.
Important input arguments
formula= a symbolic description of the trend \(\mu\) of the model to be fitted, similar to lm().
data= an optional data frame containing the variables in the model.
listw= a listw object created for example by nb2listw
family= character string: either “SAR” or “CAR” for simultaneous or conditional autoregressions
The following script loads the data and computes the proximity matrix as before
rm(list = ls())
library(spData)
# get the data
NY8.data <- st_read(
system.file("shapes/NY8_utm18.shp", package="spData")[1],
quiet=TRUE
)
# create a working data set map
NY8.map <- NY8.data
# build a neighbours list based on regions with contiguous boundaries
NY8.nb <- poly2nb(pl = NY8.data)
# supplements a neighbours list with spatial weights for the chosen coding scheme (use style="B" for binary)
NY8.listw <- nb2listw(neighbours = NY8.nb, # an object of class nb
style = "B" # style can take values “W”, “B”, “C”, “U”, “minmax” and “S”
)
The following R script
fits a SAR model where the response variable \(Z\) is Z and the deterministic predictor is \[ \mu = \beta_{0} + \beta_{1}X_{1} + \beta_{2}X_{2} + \beta_{3}X_{3} \] where \(X_{1}\), \(X_{2}\), and \(X_{3}\) stand for PEXPOSURE, PCTAGE65P, and PCTOWNHOME.
prints a summary of the produced output NY.sar.fit
library(spatialreg)
# 1
NY.sar.fit <- spautolm(formula = Z ~ PEXPOSURE + PCTAGE65P + PCTOWNHOME, #a symbolic description of the model to be fit. The details of model specification are given for lm()
data = NY8.data, # an optional data frame containing the variables in the model.
listw = NY8.listw, # a listw object created for example by nb2listw
family = "SAR" # character string: either "SAR" or "CAR"
)
# 2
summary(NY.sar.fit)
##
## Call:
## spautolm(formula = Z ~ PEXPOSURE + PCTAGE65P + PCTOWNHOME, data = NY8.data,
## listw = NY8.listw, family = "SAR")
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.58367 -0.38764 -0.03025 0.34023 4.02902
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.598779 0.174905 -3.4234 0.0006183
## PEXPOSURE 0.065148 0.041639 1.5646 0.1176753
## PCTAGE65P 3.756876 0.623907 6.0215 1.728e-09
## PCTOWNHOME -0.429711 0.190201 -2.2592 0.0238683
##
## Lambda: 0.03573 LR test value: 4.2281 p-value: 0.03976
## Numerical Hessian standard error of lambda: 0.016968
##
## Log likelihood: -276.6148
## ML residual variance (sigma squared): 0.41606, (sigma: 0.64503)
## Number of observations: 281
## Number of parameters estimated: 6
## AIC: 565.23
Task
Comment on the following:
Is the spatial autocorrelation significant?
Is each of the regressors significant at sig. level 5%?
I observe that:
????
????
The following R script maps (1) the estimated signal trend and (2) the estimated stochastic signal against the locations.
# (1)
NY8.data$signal_trend <- NY.sar.fit$fit$signal_trend
mapview(NY8.data,
zcol = "signal_trend")
# (2)
NY8.data$signal_stochastic <- NY.sar.fit$fit$signal_stochastic
mapview(NY8.data,
zcol = "signal_stochastic")
Task
Consider the Columbus OH spatial analysis data set columbus {spData}.
The following R script loads the data and generates a proximity matrix as before
rm(list=ls())
columbus.data <- sf::st_read(system.file("shapes/columbus.shp", package="spData")[1])
## Reading layer `columbus' from data source
## `/usr/local/common/FC_64/lib/R/spData/shapes/columbus.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 49 features and 20 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 5.874907 ymin: 10.78863 xmax: 11.28742 ymax: 14.74245
## CRS: NA
columbus.map <- columbus.data
library(spdep)
columbus.np <- spdep::poly2nb(pl = columbus.map, # list of polygons
queen = TRUE # bounds
)
columbus.listw <- nb2listw(columbus.np,
style = "W"
)
Do the following
Fit a SAR model. Consider the response to be CRIME and the signal trend to be constant, \(\mu = \beta_{0}\) (i.e., formula CRIME ~ 1).
Is the spatial autocorrelation significant?
Map the estimated stochastic signal against the locations.
# #####################
# Write your code below
# #####################
# 1
# #####################
# Write your code below
# #####################
#
# 2
# #####################
# Write your code below
# #####################
R function spautolm{spatialreg}
re-parametrises the Gaussian CAR model of lattice random field
\[
Z = \left(Z_{s};s\in\mathcal{S}\right)^{\top}
= \left(Z_{i}=Z\left(s_{i}\right);i=1,...n\right)^{\top}
\]
as
\[\begin{align*} Z_{i}|z_{\mathcal{S}-i} & \sim\text{N}\left(\mu_{i}+\lambda\sum_{j\ne i}b_{i,j}\left(Z_{j}-\mu_{j}\right),\kappa_{i}\right),\\ \text{E}\left(Z_{i}|Z_{\mathcal{S}-i}\right) &=\mu_{i}+\lambda\sum_{j\ne i}b_{i,j}\left(Z_{j}-\mu_{j}\right),\\ \text{Var}\left(Z_{i}|Z_{\mathcal{S}-i}\right) &=\kappa_{i} \end{align*}\]
where \(\lambda\) is a spatial autocorrelation parameter and \(\check{B}=\left(b_{i,j}\right)\) is a matrix that represents spatial dependence.
Consequently \(\lambda = 0\) implies no spatial autocorrelation.
In the lecture notes we considered \[ B = \lambda \check{B} \]
Recall that, provided that
\(K=\text{diag}\left(\left\{ \kappa_{i}\right\} \right)\) with \(\kappa_{i}>0\), matrix \(\check{B}\) satisfies \(b_{i,i}:=\left[\check{B}\right]_{i,i}=0\),
\(I- \lambda \check{B}\) is non-singular, and \(\left(I- \lambda \check{B}\right)^{-1}K>0\),
it is
\[ Z\sim\text{N}\left(\mu,\left(I-\lambda \check{B}\right)^{-1}K\right) \]
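These conditions can be verified numerically. The following base-R sketch (a hypothetical 3-area lattice, not the NY data) checks that a CAR specification yields a symmetric precision matrix \(K^{-1}\left(I-\lambda\check{B}\right)\) and a positive definite joint covariance \(\left(I-\lambda\check{B}\right)^{-1}K\).

```r
# Toy 3-area lattice: area 2 neighbors areas 1 and 3
B <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0), nrow = 3, byrow = TRUE)
lambda <- 0.3                             # spatial autocorrelation parameter
K <- diag(3)                              # kappa_i = 1 for all i
P <- solve(K) %*% (diag(3) - lambda * B)  # precision matrix K^{-1}(I - lambda B)
isSymmetric(P)                            # TRUE: required for a proper joint density
V <- solve(P)                             # joint covariance (I - lambda B)^{-1} K
min(eigen(V)$values) > 0                  # TRUE: positive definite
```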
The following code loads the data and generates a proximity matrix, as before
rm(list = ls())
# get the data
NY8.data <- st_read(
system.file("shapes/NY8_utm18.shp", package="spData")[1],
quiet=TRUE
)
NY8.map <- NY8.data
# build a neighbours list based on regions with contiguous boundaries
NY8.nb <- poly2nb(pl = NY8.data)
# supplements a neighbours list with spatial weights for the chosen coding scheme (use style="B" for binary)
NY8.listw <- nb2listw(neighbours = NY8.nb, # an object of class nb
style = "B" # style can take values “W”, “B”, “C”, “U”, “minmax” and “S”
)
Task
Fit a CAR model by using function spautolm{spatialreg}, where:
the response is \(Z\)
the regressors in the linear signal trend are PEXPOSURE, PCTAGE65P, and PCTOWNHOME
the spatial neighborhood matrix is NY8.listw
the weights (the \(\kappa_{i}\)’s) are the column POP8 of the dataset NY8.data,
and name the output NY.car.fit.
#######################
# Write your code below
#######################
Task
Comment on the following:
Is the spatial autocorrelation significant?
Is each of the regressor significant at sig. level 5%?
Do the likelihood ratio test results of \(\lambda\) for the SAR (without weights) and CAR (with weights) models give the same answer? Any ideas why this happens?
I observe that:
???
???
???
???